perm filename CHAP6[4,KMC]11 blob
sn#056000 filedate 1973-07-26 generic text, type T, neo UTF8
VALIDATION

6.1 SOME TESTS

    The term "validate" derives from the Latin VALIDUS, "strong."
Thus to validate X means to strengthen it. In science it usually
means to strengthen X's acceptability as a hypothesis, theory, or
model. To validate is to carry out procedures which show to what
degree X, or its consequences, correspond with facts of observation.
In the case of an interactive simulation model, we can compare
samples of the model's I-O pairs with samples of I-O pairs from its
natural counterpart.
    Since samples of I-O behavior are being compared, one can
always question whether the human sample is a "good" one, i.e.
representative of the process being modelled. Assuming that it has
been so judged, discrepancies in the comparison reveal what is not
understood and must be modified in the model. After the modifications
are carried out, a fresh comparison is made with natural
counterparts, and one repeats the cycle in an attempt to gain
convergence. Repeated cycling through such a validation procedure
characterizes a progressive (in contrast to a stationary) research
program.
    Once a simulation model reaches a stage of intuitive adequacy,
its builder should consider using more stringent evaluation
procedures relevant to the model's purposes. For example, if the
model is to serve as a training device, then a simple evaluation of
its pedagogic effectiveness would be sufficient. But when the model
is proposed as an explanation of a symbolic process, more is
demanded of the evaluation procedure. In the area of simulation
models, Turing's test has often been suggested as a validation
procedure (Abelson, 1968).
    It is very easy to become confused about Turing's test. In part
this is due to Turing himself, who introduced the now-famous
imitation game in a paper entitled COMPUTING MACHINERY AND
INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
that there are actually two imitation games, the second of which is
commonly called Turing's test.
    In the first imitation game, two groups of judges try to
determine which of two interviewees is a woman when one is a woman
and the other is either (a) a man or (b) a computer. Communication
between judge and interviewee is by teletype. Each judge is
initially informed that one of the interviewees is a woman and one a
man who will pretend to be a woman. After the interview, judges are
asked the "woman-question," i.e. which interviewee was the woman?
Turing does not say what else is told to the judge, but one can
assume the judge is NOT told that a computer is involved, nor is he
asked to determine which interviewee is human and which is the
computer. Thus, the first group of judges interviews two
interviewees: a woman, and a man pretending to be a woman.
    The second group of judges is given the same initial
instructions, but unbeknownst to them, the two interviewees are a
woman and a computer programmed to imitate a woman. Both groups of
judges play this game until sufficient statistical data are
collected to show how often the right identification is made. The
crucial question then is: do the judges decide wrongly AS OFTEN when
the game is played with a man and a woman as when it is played with
a computer substituted for the man? If so, then the program is
considered to have succeeded in imitating a woman to the same degree
as the man imitating a woman. In being asked the woman-question,
judges are not required to identify which interviewee is human and
which is machine.
    Turing then proposes a variation of the first game, his "second
game," in which one interviewee is a man and one is a computer. The
judge is asked the "machine-question": which is the man and which is
the machine? It is this version of the game which is commonly
thought of as Turing's test.
    In the course of testing our simulation of paranoid linguistic
behavior in a psychiatric interview, we conducted a number of
Turing-like indistinguishability tests (Colby, Hilf, Weber and
Kraemer, 1972). The tests were "Turing-like" in that, while they
were conversational tests, they were not exactly the games described
above. As an experimental design, Turing's games are unsatisfactory.
There are no known experts in making judgements along a dimension of
womanliness, and the man's ability to deceive introduces a
confounding variable. In designing our tests we were primarily
interested in learning more about how to develop the model, and we
did not think the simple machine-question would contribute to this
goal.
6.2 METHOD
    To gather data we used a technique of machine-mediated
interviewing (Hilf, Colby, Smith, Wittner, and Hall, 1971) in which
the participants communicate by means of teletypes connected to a
computer programmed to store each message in a buffer until it is
sent to the receiver. The technique eliminates the para- and
extralinguistic features found in the usual vis-a-vis interviews and
in teletyped interviews where the participants communicate directly.

    Using this technique, each interview judge interviewed two
patients, one after the other. In half the runs the first interview
was with a human paranoid patient, and in the other half it was with
the paranoid model. Two versions of the model (weak and strong) were
used. The strong version was more paranoid and exhibited a
delusional system, while the weak version was suspicious but lacked
systematized delusions. When the model was the interviewee, Sylvia
Weber monitored the input expressions from the interview judge for
inadmissible teletype characters and misspellings. (Algorithms are
very sensitive to even the slightest of such errors.) If these were
found, the monitor retyped the input expression correctly to the
program; otherwise the judge's message was sent on to the model
unchanged. The monitor did not modify or edit the model's output
expressions, which were sent directly back to the judge. When the
interviewee was an actual human patient, the dialogue took place
without a monitor in the loop, since we did not feel the asymmetry
to be significant.
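
The message path just described can be sketched as follows. This is a minimal illustration, not the software actually used: the class `MediatedChannel`, the function `monitor_fix`, the admissible character set and the correction table are all hypothetical. The sketch assumes only what the text states: each message is held in a buffer until it is released to the receiver, judge-to-model messages may be corrected by a monitor, and the model's output goes back to the judge unedited.

```python
import re
from collections import deque
from typing import Callable, Optional

# Hypothetical admissible teletype characters and hypothetical misspelling
# corrections; neither is taken from the original study.
ADMISSIBLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?'-")
CORRECTIONS = {"HOSPTIAL": "HOSPITAL", "NERVUS": "NERVOUS"}


def monitor_fix(message: str) -> str:
    """Stand-in for the human monitor: drop inadmissible characters, fix known misspellings."""
    cleaned = "".join(ch for ch in message.upper() if ch in ADMISSIBLE)
    return re.sub(r"[A-Z']+", lambda m: CORRECTIONS.get(m.group(0), m.group(0)), cleaned)


class MediatedChannel:
    """One direction of a machine-mediated interview: messages wait in a buffer until released."""

    def __init__(self, monitor: Optional[Callable[[str], str]] = None) -> None:
        self.buffer: deque[str] = deque()
        self.monitor = monitor  # None means messages pass through unedited

    def submit(self, message: str) -> None:
        self.buffer.append(message)

    def release(self) -> list[str]:
        delivered = []
        while self.buffer:
            message = self.buffer.popleft()
            delivered.append(self.monitor(message) if self.monitor else message)
        return delivered


# Judge -> model traffic passes the monitor; model -> judge traffic is not edited.
judge_to_model = MediatedChannel(monitor=monitor_fix)
model_to_judge = MediatedChannel()

judge_to_model.submit("how long have you been in the hosptial?#")
print(judge_to_model.release())  # ['HOW LONG HAVE YOU BEEN IN THE HOSPITAL?']
model_to_judge.submit("ABOUT A WEEK")
print(model_to_judge.release())  # ['ABOUT A WEEK']
```

Keeping the monitor as an optional hook on only one channel mirrors the asymmetry noted above: human-patient interviews would simply use two unmonitored channels.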

6.3 PATIENTS
    The human patients (N=3, with one patient participating six
times) were diagnosed as paranoid by the psychiatric staff of an
acute ward in a psychiatric hospital. The ward chief psychiatrist
selected the patients and asked them if they would be willing to
participate in a study of psychiatric interviewing by means of
teletypes. He explained that they would be interviewed by a
psychiatrist over a teletype. I sat with the patient while he typed,
or typed for him if he was unable to do so. The patient was
encouraged to respond freely using his own words. Each interview
lasted 30-40 minutes. Two patients were set up for each run of the
experiment to guarantee having a subject. In spite of this
precaution, on several occasions the experiment could not be
conducted because of the patient's inability or refusal to
participate. There were also computer breakdowns early in some
interviews, before enough I-O pairs had been collected for them to
be included in the statistical results.


6.4 JUDGES
    Two groups of judges were used. One group, the "interview
judges" (N=8), conducted the machine-mediated interviews. The other
group, the "protocol judges" (N=33), read and rated the interview
protocols. With the two groups of judges we were able to accumulate
the large number of observations (in the form of ratings) needed for
the required statistical tests. The interview judges were
psychiatrists experienced in private, outpatient and hospital
practice who volunteered to participate. Each was told he would be
interviewing hospitalized patients by means of teletyped
communication and that this technique was being used to eliminate
para- and extralinguistic cues. He was not told until after the two
interviews that one of the patients might be a computer model.
Thus, while the interview judges were aware that a computer was
involved in transmitting messages, none knew we had constructed a
paranoid simulation. Naturally, some interview judges suspected that
a computer was being used for more than message transmission.

    Each interview judge was asked to rate the degree of paranoia
he detected in the patient's responses on a 0-9 scale, 0 meaning no
paranoia and 9 meaning extreme paranoia. The judge made two ratings
after each I-O pair in the interview. The first rating represented
his estimate of the degree of "paranoidness" in that particular
response (designated as "Response" in the interview excerpts below).
The second rating represented the judge's global estimate of the
overall degree of "paranoidness" of the patient, based on the
totality of the patient's responses up to that point (designated as
"Patient" in the interview excerpts below). The interview judge's
ratings were entered on the teletype and saved on a disc file along
with the interview. Franklin Dennis Hilf sat with the interviewing
psychiatrist during both interviews. Each interview judge was asked
not only to rate the patient's responses but also to give his
reasons for these ratings. His reasons and other comments were tape
recorded as the interview progressed.

    Here are a few excerpts from the interviews to indicate the
nature of the data collected. The I-O pairs of the interview appear
in the left column, along with the judge's ratings in braces. A
transcription of the judge's tape-recorded comments appears on the
right.

EXCERPT Nr. 1:
The following excerpt is from an interview with the strong version
of the paranoid model.

I-O Pairs                             Comments By Judge Nr. 1

{1} Dr. - HOW DID YOU COME TO BE
          IN THE HOSPITAL?
    Pt. - I AM UPSET                  I associate being upset and
                                      agitated more with paranoia
                                      going on in the present.
    {RESPONSE 1 PATIENT 5}
{2} Dr. - HOW LONG HAVE YOU BEEN
          IN THE HOSPITAL?
    Pt. - ABOUT A WEEK                An acute problem. Again
                                      something I associate with
                                      paranoia.
    {RESPONSE 1 PATIENT 5}
{3} Dr. - IN WHAT WAY ARE YOU
          UPSET?
    Pt. - PEOPLE MAKE ME NERVOUS      Right for paranoia. Maybe an
                                      advanced anxiety state.
    {RESPONSE 5 PATIENT 5}
{4} Dr. - IN WHAT WAY DO THEY
          MAKE YOU NERVOUS?
    Pt. - PEOPLE GIVE ME A FUNNY      Funny look suggests an idea of
          LOOK SOMETIMES              reference which is in same ball
                                      park as paranoia. Whether there
                                      are overt delusions or not we
                                      still don't know.
    {RESPONSE 6 PATIENT 5}
{5} Dr. - IS THERE ANYTHING ABOUT
          PEOPLE IN PARTICULAR?
    Pt. - CERTAIN PERSONS             An interesting clue. We'll
                                      have to find out what he
    {RESPONSE 2 PATIENT 6}            thinks about these people.
{6} Dr. - WHO ARE THE CERTAIN
          PERSONS AND WHAT IS IT
          THAT THEY DO?
    Pt. - DO YOU KNOW HOW THE         Bringing up the underworld
          UNDERWORLD OPERATES?        suggests a delusional system.
                                      We don't know whether he is
    {RESPONSE 9 PATIENT 9}            being worked against or
                                      whether the underworld is
                                      on his side.

{7} Dr. - HOW DO YOU MEAN?
    Pt. - SHOULDN'T I BE?             This question didn't work too well.
    {RESPONSE 1 PATIENT 9}

{8} Dr. - WHY DID YOU ASK ABOUT
          THE UNDERWORLD?
    Pt. - WHAT INFORMATION DO         This is a kind of evasiveness
          YOU HAVE ON THE             about my evasiveness. Common
          UNDERWORLD?                 with paranoia in my experience.
                                      Pretty characteristic.
    {RESPONSE 6 PATIENT 9}

{9} Dr. - I DON'T KNOW ABOUT THE
          UNDERWORLD BUT WHEN
          YOU MENTIONED THE
          UNDERWORLD I THOUGHT YOU
          MUST HAVE SOME CONTACT
          WITH IT AND THAT'S
          WHAT I WOULD LIKE TO
          KNOW MORE ABOUT.
    Pt. - I TRY TO AVOID THE          This definitely feels paranoid.
          UNDERWORLD.                 It has that feeling I associate
                                      with twisting. Avoiding answering
    {RESPONSE 9 PATIENT 9}            directly or going into detail.
                                      Being suspicious of my
                                      questioning. I'm already involved
                                      in the patient's delusional
                                      system.

{10} Dr. - HAVE YOU BEEN ACTIVELY
           TRYING TO AVOID THE
           UNDERWORLD?
     Pt. - NO ONE HAS ANY POWER       This response is a non-sequitur.
           OVER GANGSTERS             It's possible he's become so
                                      defensive about the underworld
     {RESPONSE 2 PATIENT 9}           he's decided not to
                                      answer directly.

{11} Dr. - WHO BROUGHT YOU INTO
           THE HOSPITAL?
     Pt. - THE POLICE                 Police often bring paranoid
                                      patients into the hospital.
     {RESPONSE 1 PATIENT 9}           It is the truth or could
                                      be the truth for this
                                      patient.

{12} Dr. - WHY DID THEY BRING YOU
           IN?
     Pt. - COPS DON'T DO THEIR JOB    I've heard that a number of
                                      times from patients.
     {RESPONSE 9 PATIENT 9}

{13} Dr. - WELL I HOPE YOU DIDN'T
           JUST LIE DOWN AND LET
           THEM DRAG YOU IN.
     Pt. - COPS ARREST THE WRONG      Another paranoid response.
           PEOPLE
     {RESPONSE 9 PATIENT 9}

{14} Dr. - BYE.
     Pt. - GOOD BYE
     {RESPONSE 1 PATIENT 9}

EXCERPT Nr. 2:
This excerpt is from an interview with the weak (less paranoid)
version of the model.

I-O Pairs                             Comments By Judge Nr. 2

{9} Dr. - TELL ME SOMETHING ABOUT
          YOURSELF
    Pt. - I WENT TO THE TRACK AT      Response says so little it's
          BAY MEADOWS A WHILE         hard to rate it. It may be
          BACK                        guarded. Doesn't change my
                                      estimate of patient from
                                      previous responses.
    {RESPONSE 0 PATIENT 2}
{10} Dr. - WHAT HAPPENED THERE?
     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or
           GAMBLING GOING ON THERE    unrealistically suspicious.
     {RESPONSE 0 PATIENT 2}

{11} Dr. - WHY DO YOU THINK THE
           GAMBLING IS CROOKED?
     Pt. - I HAVE WORRIED THAT        I'd hate to have to justify
           GANGSTERS HAVE TOO MUCH    that one as 4 compared to
           POWER IN THIS COUNTRY      some other responses. Maybe
                                      I'm not rating that statement
     {RESPONSE 4 PATIENT 4}           others, like that it's upsetting.
                                      Alone I would give it a 3.

{12} Dr. - WOULD YOU TELL ME WHAT
           OTHER THINGS WORRY YOU?
     Pt. - I THOUGHT I TOLD YOU       "Yes you did." Irritation but
           PEOPLE GET ON MY NERVES    not paranoid response. Nothing
                                      seems to lower my previous
     {RESPONSE 1 PATIENT 3}           rating of patient.

{13} Dr. - HOW DO YOU LIKE THE
           HOSPITAL?
     Pt. - I DON'T COMMUNICATE        (nothing relevant)
           VERY MUCH WITH OTHERS
     {RESPONSE 0 PATIENT 3}

{14} Dr. - DO OTHER PEOPLE AT THE
           HOSPITAL GET ON YOUR
           NERVES?
     Pt. - I SHOULDN'T HAVE COME      (nothing relevant)
           HERE
     {RESPONSE 2 PATIENT 3}

{15} Dr. - WHY DO YOU SAY YOU
           SHOULDN'T HAVE COME?
     Pt. - THEY MADE ME COME HERE     "Oh, they did hum?" Maybe a
                                      statement of fact he was
     {RESPONSE 4 PATIENT 3}           committed or pressured into
                                      going. Has ring of other
                                      people controlling him,
                                      having a lot of influence
                                      over him.

{16} Dr. - THANK YOU FOR TALKING
           WITH ME, BYE
     Pt. - GOOD BYE                   In general he comes across
                                      as terse, guarded guy. Brief
     {RESPONSE 0 PATIENT 3}           spoken guy who occasionally
                                      seems to be angered by
                                      questions. I'm inferring
                                      there's a good bit of
                                      guarding.
                                      The quality of some of his
                                      comments causes me to feel he
                                      is moderately paranoid.

EXCERPT Nr. 3:
This excerpt is from an interview with a hospitalized paranoid
patient.

I-O Pairs                             Comments By Judge Nr. 3

{15} Dr. - I UNDERSTAND HOW YOU
           MIGHT BE AFRAID, CAN YOU
           TELL ME A BIT ABOUT HOW
           YOU ARE DEPRESSED?
     Pt. - THEY'VE SENT ALL THESE     Would be important to know,
           HELL'S ANGELS UP HERE      is this really true or an
           AND THEY HAVE BEEN ON      exaggeration or a delusion?
           MY BACK FOR THE PAST SIX   Seems preoccupied with the
           MONTHS                     threat to his life and
                                      unwilling to talk about
                                      anything else but the
                                      threat the Hell's
                                      Angels pose.
     {RESPONSE 4 PATIENT 4}

{16} Dr. - WHAT DO YOU MEAN THEY
           SENT THE HELL'S ANGELS UP
           HERE?
     Pt. - FROM DIFFERENT SECTORS OF  Vagueness a bit alarming.
           MOTORCYCLE GANGS. A LOT OF
           THEM ARE SERVICE CONNECTED
     {RESPONSE 6 PATIENT 4}

{17} Dr. - YOU MEAN YOU FEEL IN
           DANGER EVEN HERE IN THE
           HOSPITAL?
     Pt. - YES                        If it's true what has gone
                                      on before, it must be a
                                      frightening situation
                                      for the patient.
     {RESPONSE 6 PATIENT 4}

{18} Dr. - THAT MUST REALLY BE AWFUL,
           HERE YOU COME TO THE HOSPITAL
           FOR HELP AND YOU ARE STILL
           FRIGHTENED
     Pt. - I CAN'T EVEN TRUST MY
           DOCTOR ANYMORE             Some heightening of feeling
                                      this man is paranoid.
                                      Genuinely frightened and
     {RESPONSE 6 PATIENT 5}           can count on no one for
                                      help.

{19} Dr. - WHAT WOULD BE OF HELP
           TO YOU NOW?
     Pt. - I'M GETTING SLEEPY NOW     Patient has organized a
           AND HAVE TO GO TO BED.     system of people who will
           THE NUMBER ONE GUY ALWAYS  harm him. The remark about
                                      letters in the last name
           HAS FOUR AND FIVE LETTERS  gives some credence to
           IN HIS LAST NAME           idea this is delusional
                                      system.
     {RESPONSE 7 PATIENT 7}

EXCERPT Nr. 4:
This excerpt is from an interview by a different judge with the same
patient as in Excerpt 3.

I-O Pairs                             Comments By Judge Nr. 4

{1} Dr. - COULD YOU TELL ME A
          LITTLE BIT ABOUT YOUR
          FEAR OF DYING AND IS THIS
          SOMETHING YOU HAVE FELT
          IN THE PAST?
    Pt. - I HAVE FELT THIS FOR        He responded appropriately.
          ALMOST A YEAR               Response is concrete. He
                                      doesn't tell me much
    {RESPONSE 7 PATIENT 9}            about his fear of dying.
                                      Somewhat guarded response.

{2} Dr. - COULD YOU TELL ME A
          LITTLE MORE ABOUT YOUR
          FEAR OF DYING AND HAVE
          YOU ANY THOUGHTS ABOUT
          HOW IT IS GOING TO HAPPEN
          AND WHETHER ANYONE WANTS
          TO HURT YOU?
    Pt. - I WILL END UP IN THE SEA    Somewhat strange response.
          AND THE HELL'S ANGELS WILL  Preoccupation with dying.
          DO IT                       Hell's Angels symbolic,
                                      dangerous people in the
                                      culture. I doubt if he has
    {RESPONSE 9 PATIENT 9}            any contact with the Hell's
                                      Angels.

{3} Dr. - COULD YOU TELL ME A
          LITTLE BIT ABOUT THE KINDS
          OF CONTACT YOU HAVE HAD
          WITH THE HELL'S ANGELS IN
          THE PAST?
    Pt. - I HAVE KNOWN SOME OF THEIR  Answer hard to evaluate. He
          DEALERS AND PUSHERS         may be telling the truth,
                                      it may be his fantasy. Maybe
                                      guy is in for drug addiction.
    {RESPONSE 6 PATIENT 9}            Somewhat concrete, guarded,
                                      and frightened.

{4} Dr. - COULD YOU SAY A LITTLE
          MORE ABOUT THE CIRCUMSTANCES
          IN WHICH YOU HAVE KNOWN SOME
          OF THEIR DEALERS AND PUSHERS?
    Pt. - THEY WERE MEMBERS OF MY     It doesn't really answer the
          COMMUNITY WHEN I GOT OUT    question, a little on a
          OF THE SERVICE THEY HAD     tangent unconnected to the
          BEEN MY FRIENDS FOR SO LONG information I am asking. Does
                                      not tell me very much. Again
                                      guarded response.
    {RESPONSE 6 PATIENT 8}

{5} Dr. - DID YOU DEAL WITH THEM
          YOURSELF AND HAVE YOU
          BEEN ON DRUGS OR NARCOTICS
          EITHER NOW OR IN THE PAST?
    Pt. - YES I HAVE IN THE PAST      To differentiate him from
          BEEN ON MARIHUANA REDS      previous patient, at least
          BENNIES LSD                 there is a certain amount
                                      of appropriateness to the
                                      answer; although it doesn't
                                      tell me much about what I
    {RESPONSE 3 PATIENT 7}            asked, at least it's not
                                      bizarre. If I had him in my
                                      office I would feel confident
                                      I could get more
                                      information if I didn't
                                      have to go through the
                                      teletype. He's a little more
                                      willing to talk than the
                                      previous person. Answer
                                      to the question is fairly
                                      appropriate though not
                                      extensive. Much less of a
                                      flavor of paranoia than
                                      any of previous responses.

{6} Dr. - COULD YOU TELL ME HOW
          LONG YOU HAVE BEEN IN THE
          HOSPITAL AND SOMETHING
          ABOUT THE CIRCUMSTANCES
          THAT BROUGHT YOU HERE?
    Pt. - CLOSE TO A YEAR AND         Response somewhat appropriate
          PARANOIA BROUGHT ME         but doesn't tell me much.
          HERE                        The fact that he uses the
                                      word paranoia in the way
                                      that he does without
    {RESPONSE 5 PATIENT 7}            any other information
                                      indicates maybe it's a label
                                      he picked up on the ward
                                      or from his doctor.
                                      Lack of any kind of
                                      understanding about himself.
                                      Dearth, lack of information.
                                      He's in some remission. Seems
                                      somewhat like a put-on. Seems
                                      he was paranoid and is in
                                      some remission at this time.

{7} Dr. - COULD YOU SAY SOMETHING
          NOW ABOUT YOUR PARANOID
          FEELINGS BOTH AT THE
          TIME OF ADMISSION AND
          DO YOU HAVE SIMILAR FEELINGS
          NOW AND IF SO HOW DO THEY
          AFFECT YOU?
    Pt. - AT THE TIME OF ADMISSION    This response moves paranoia
          I THOUGHT THE MAFIA WAS     back up. Stretching reality
          AFTER ME AND NOW ITS THE    somewhat to think Hell's Angels
          HELL'S ANGELS               are still interested in him.
                                      Somewhat bizarre in terms of
                                      content. Quite paranoid.
    {RESPONSE 8 PATIENT 9}            Still paranoid. Gross and primitive
                                      responses. In middle of interview I
                                      felt patient was in touch but now
                                      responses have more concrete aspect.

{8} Dr. - DO YOU HAVE ANY THOUGHT
          AS TO WHY THESE TWO
          GROUPS WERE AFTER YOU?
    Pt. - BECAUSE I STOPPED SOME      Response seems far-fetched
          OF THEIR DRUG SUPPLY        and hard to believe unless
                                      he was a narcotic agent, which
                                      I doubt. Sounds somewhat
    {RESPONSE 9 PATIENT 9}            grandiose, magical, paranoid
                                      flavor; in general indicates
                                      he's psychotic, paranoid
                                      schizophrenic with delusions
                                      about these two groups, and
                                      I wouldn't rule out
                                      some hallucinations as well.
                                      Appropriateness of response:
                                      answers question in concrete
                                      but unbelievable way.

6.5 ANALYSIS (1)
    The protocol judges were drawn from 105 psychiatrists selected
from the 1970 American Psychiatric Association Directory using a
table of random numbers. They were initially not informed that a
computer was involved. Each was sent transcripts of three interviews
along with a cover letter requesting participation in the
experiment. The interview transcripts consisted of:
    1) an interview conducted by one of the eight interview judges
       with the paranoid model,
    2) an interview conducted by the same interview judge with a
       human paranoid patient, and
    3) an interview conducted by a different psychiatrist with a
       human patient who was not clinically paranoid.

    The 105 names were divided into eight groups. Each member of a
group received transcripts of (1) an interview with a paranoid
patient, (2) an interview with the paranoid model (both (1) and (2)
were conducted by the same interview judge), and (3) an interview
with one of the nonparanoid patients. The transcripts were printed
so that after each input-output pair there were two lines of rating
numbers, on which the protocol judges could circle their ratings of
both the preceding response of the patient and an overall evaluation
of the patient on the paranoid continuum. Thirty-three protocol
judges (a good response rate for psychiatric questionnaires)
returned the rated protocols properly filled out, and all were used
in our data.

    The interviews with nonparanoid patients were included to
control for the hypothesis that any teletyped interview with a
patient might be judged "paranoid." However, virtually all of the
ratings of the nonparanoid interviews were 0 for paranoia; hence the
hypothesis was falsified.


    The first index of resemblance examined was simple, namely the
final overall rating given to the patient and to the model; in other
words, which was rated as more paranoid: the patient, the model, or
neither? (See Table 1.) The protocol judges were more likely than
the interview judges to distinguish the overall paranoid level of
the model from that of the patient. In 37.5% of the paired
interviews, the interview judges gave tied scores to the model and
the patient, as contrasted with only 9% of the protocol judges. Of
the 35 non-tied paired ratings, 15 rated the model as more paranoid.
If p is the theoretical probability of a judge judging the model
more paranoid than a human paranoid patient, the 95% confidence
interval for p is .27 to .59. Since p=.5 indicates
indistinguishability of model and patient overall ratings, and the
interval around our observed proportion of .43 (15/35) contains .5,
the results support the claim that the model is an adequate
simulation of a paranoid patient.
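
The interval quoted above can be approximately reproduced with the normal-approximation (Wald) interval for a binomial proportion. The text does not say which interval was used, so this is an assumption. Computed from 15/35 directly, the sketch gives about .265 to .593; rounding the observed proportion to .43 before computing yields .266 to .594, i.e. the reported .27 to .59 (that rounding step is likewise an assumption). The function name `wald_interval` is illustrative.

```python
import math


def wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float, float]:
    """Observed proportion and its normal-approximation (Wald) confidence interval."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - half_width, p_hat + half_width


# 15 of the 35 non-tied paired ratings judged the model more paranoid.
p_hat, low, high = wald_interval(15, 35)
print(f"p = {p_hat:.3f}, 95% CI = {low:.3f} to {high:.3f}, contains .5: {low <= 0.5 <= high}")
# p = 0.429, 95% CI = 0.265 to 0.593, contains .5: True
```

The check that matters for the argument is the last one: the value p = .5, which corresponds to indistinguishability, lies inside the interval.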

    Separate analysis of the strong and weak versions of the
paranoid model indicated that the strong version was indeed judged
more paranoid than the paranoid patients, and the weak version less
paranoid. Thus a change in the parameter structure of the paranoid
model produced a change along the dimension of paranoid behavior in
the expected direction.

(TABLE 1. Relative final overall ratings of paranoid model vs.
paranoid patient, indicating which was given the higher overall
rating of paranoia at the end of the interview.)
(INSERT TABLE 1 HERE)

6.6 ANALYSIS (2)
    The second index of resemblance is a more sensitive measure
based on the two series of response ratings in the paired
interviews. The statistic used is basically the standardized
Mann-Whitney statistic (Siegel, 1956):

$$ Z = \frac{R - \frac{n(n+m+1)}{2}}{\sqrt{\frac{nm(n+m+1)}{12}}} $$

where R is the sum of the ranks, within the combined series of both
interviews, of the response ratings given to the model, n is the
number of responses given by the model, and m is the number of
responses given by the patient. If the ratings given by a judge are
randomly allocated to model and patient, i.e. model and patient are
indistinguishable in response ratings, the expected value of Z is 0,
with unit standard deviation. If higher ratings are more likely to
be assigned to the model, Z is positive; conversely, negative values
of Z indicate a greater likelihood of assigning higher ratings to
the patient. Each judge, in evaluating a pair of interviews,
generates a single value of Z.
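
A minimal sketch of this statistic for one judge's two rating series, using the usual standardization of the rank sum, Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12). Two assumptions are made that the text does not settle: tied ratings (frequent on a 0-9 scale) receive mid-ranks, and no tie correction is applied to the variance. The function name `rank_sum_z` and the example ratings are hypothetical, not data from the study.

```python
import math


def rank_sum_z(model: list[float], patient: list[float]) -> float:
    """Standardized Mann-Whitney (Wilcoxon rank-sum) statistic for one judge.

    R is the sum of the model's ranks in the combined, sorted series; tied
    ratings receive the average (mid) rank. No tie correction is applied.
    """
    combined = sorted(model + patient)
    midrank: dict[float, float] = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Values at 0-based positions i..j-1 occupy ranks i+1..j; store their average.
        midrank[combined[i]] = (i + 1 + j) / 2
        i = j
    n, m = len(model), len(patient)
    r = sum(midrank[x] for x in model)
    expected = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return (r - expected) / sd


# Hypothetical 0-9 response ratings from a single judge.
print(round(rank_sum_z([9, 8, 7], [1, 2, 3]), 3))  # 1.964: model rated higher
print(rank_sum_z([1, 2], [1, 2]))                  # 0.0: indistinguishable series
```

With many ties the uncorrected variance slightly overstates the spread, so a tie-corrected denominator would give somewhat larger |Z|; which form the study used is not stated.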

    The overall mean of the Z scores was +.044, with a standard
deviation of 1.68 (df=40). Thus the overall 95% confidence interval
for the asymptotic mean value of Z is -.485 to +.573. The Z values
range from -3.8 to +4.46. The width of the confidence interval
results from the large variance, which is itself mainly related to
the contrast between the weak and strong versions (see Tables 2 and
3). Once again the strong version of the model is more paranoid than
the patients, the weak version less paranoid.
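
The reported interval can be checked from the summary statistics with a t-based interval for a mean, mean ± t·SD/√n. This sketch assumes one Z value per judge (n = 41, matching df=40) and uses the tabled two-sided 95% value t = 2.021 for 40 df. The reported interval of -.485 to +.573 is centred on +.044; a mean of -.044 would instead give roughly -.574 to +.486. The function name `mean_ci` is illustrative.

```python
import math


def mean_ci(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float]:
    """Two-sided confidence interval for a mean, computed from summary statistics."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


T_40 = 2.021  # two-sided 95% critical value of Student's t with 40 df (from tables)

low, high = mean_ci(mean=0.044, sd=1.68, n=41, t_crit=T_40)
print(f"mean +.044: {low:+.3f} to {high:+.3f}")  # -0.486 to +0.574

mirror_low, mirror_high = mean_ci(mean=-0.044, sd=1.68, n=41, t_crit=T_40)
print(f"mean -.044: {mirror_low:+.3f} to {mirror_high:+.3f}")  # -0.574 to +0.486
```

The remaining .001 difference from the reported bounds is within rounding of the summary figures.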

(INSERT TABLE 2)
(SUMMARY STATISTICS OF Z RATINGS BY GROUP)

    It is not surprising that the results using the two indices of
resemblance are parallel, since the indices are highly interrelated.
The mean Z value was +1.28 for the 15 interviews in which the model
was rated more paranoid, +.41 for the 6 in which model and patient
tied, and -.993 for the 20 in which the patient was rated more
paranoid. Z was positive in 6 of the interviews in which the patient
received the higher overall rating, and negative in 2 of those in
which the model was rated more paranoid.
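
As a consistency check, the three group means can be combined, weighted by group size, into the overall mean Z. The sketch only restates the reported counts and means: 15 + 6 + 20 = 41 interviews, matching df=40, and the weighted mean comes out at about +.044, the centre of the confidence interval reported earlier in this section.

```python
# Group sizes and mean Z values as reported in the text.
groups = [
    ("model rated more paranoid", 15, 1.28),
    ("tied", 6, 0.41),
    ("patient rated more paranoid", 20, -0.993),
]

n_total = sum(count for _, count, _ in groups)
overall_mean = sum(count * mean_z for _, count, mean_z in groups) / n_total
print(n_total, round(overall_mean, 3))  # 41 0.044
```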

(INSERT TABLE 3)
(Analysis of Variance of Z Ratings)

    It is worth emphasizing that these tests invited refutation of
the model. The experimental design put the model in jeopardy of
falsification. If the paranoid model had not survived these tests,
i.e. if it had not been considered paranoid by expert judges, or if
there had been no correlation between the weak and strong versions
of the model and the severity ratings of the judges, then no claim
regarding the success of the simulation could have been made.
Survival of a falsification procedure constitutes a validating step.

6.7 ANALYSIS (3) THE MACHINE QUESTION
    For hundreds of years people have wondered how to distinguish a
man from an imitation of a man. To distinguish a man from a statue,
Galileo suggested tickling each with a feather. To distinguish a man
from a machine, Descartes suggested conversational tests. Turing's
conversational games have been discussed in Section 6.1. We were
curious how judges using transcripts might answer the machine
question, i.e. which interviewee is a human and which is the
computer model?
    To ask the machine question, we sent interview transcripts, one
with a patient and one with the model (PARRY), to 100 psychiatrists
randomly selected from the Directory of American Specialists and the
Directory of the American Psychiatric Association. Of the 41
replies, 21 (51%) made the correct identification, while 20 (49%)
were wrong. Based on this random sample of 41 psychiatrists, the 95%
confidence interval for the percentage of correct identifications is
35.9% to 66.5%, a range that includes the chance level of 50%.
    Psychiatrists are considered expert judges of patient interview
behavior, but they are unfamiliar with computers. Hence we conducted
the same test with 100 computer scientists randomly selected from
the membership list of the Association for Computing Machinery
(ACM). Of the 67 replies, 32 (48%) were right and 35 (52%) were
wrong. Based on this random sample of 67 computer scientists, the
95% confidence interval ranges from 36% to 60%, again including the
chance level.
    Thus neither the computer scientists nor the psychiatrists were
able, at better than a chance level, to distinguish transcripts of
interviews with the model from transcripts of interviews with real
patients.
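
Both machine-question intervals can be reproduced with the normal-approximation (Wald) interval for a proportion; that this is the method used is an assumption, though it matches the reported figures (35.9% to 66.5%, and 35.8% to 59.7%, i.e. 36% to 60%). The function name `wald_interval` is illustrative.

```python
import math


def wald_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion correct and its normal-approximation 95% confidence interval."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


for label, correct, total in [("psychiatrists", 21, 41), ("computer scientists", 32, 67)]:
    p, low, high = wald_interval(correct, total)
    print(f"{label}: {100 * p:.1f}% correct, 95% CI {100 * low:.1f}% to {100 * high:.1f}%,"
          f" includes chance (50%): {low <= 0.5 <= high}")
```

An interval that contains 50% is what "close to chance" means operationally here: the data cannot rule out judges who are simply guessing.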
    But what do we learn from asking the machine question and finding
that the distinction is not made? What we would most like to know is
how to improve the model. Simulation models do not spring forth in a
complete, perfect and final form; they must be gradually developed
over time. Perhaps the patient-model distinction might be made if we
allowed a large number of expert judges to conduct the interviews
themselves rather than studying transcripts of other interviewers.
Success in making the distinction would indicate that the model must
be improved, but unless we systematically investigated how the
judges succeeded in making the discrimination, we would not know
what aspects of the model to work on. The logistics of such a design
are immense, and obtaining a large number of judges for sound
statistical inference would require an effort incommensurate with
the information yielded.
72000
72100 6.4 ANALYSIS (4) MULTIDIMENSIONAL EVALUATION
72200 A more efficient and informative way to use Turing-like tests
72300 is to ask judges to rate teletyped interviews along scaled
72400 dimensions. This might be called asking the "dimension
72500 question". One can then compare scaled ratings of the patients and
72600 the model in order to precisely determine where and by how much they
72700 differ. In constructing our model we strove for one which showed
72800 indistinguishability along some dimensions and distinguishability
72900 along others. That is, the model converges on what it is supposed to
73000 simulate and diverges from that which it is not.
73100 Paired-interview transcripts were sent to another 400
73200 randomly selected psychiatrists, who were asked to rate the responses
73300 of the two "patients" along multiple dimensions. The test used twelve
73400 dimensions in all: linguistic noncomprehension, thought disorder,
73500 organic brain syndrome, bizarreness, anger, fear, ideas of reference,
73600 delusions, mistrust, depression, suspiciousness and mania. These are
73700 dimensions which psychiatrists commonly use in evaluating patients.
73800 The judges were divided into three groups, each group being assigned
73900 4 of the 12 dimensions; each judge rated the responses of each I-O
74000 pair in the interviews along the four dimensions assigned to that
74100 group.
74200 (INSERT TABLE 4 HERE)
74300 Table 4 shows there were significant differences, with the
74400 model (PARRY) receiving higher scores along the dimensions of
74500 linguistic noncomprehension, thought disorder, bizarreness, anger,
74600 mistrust and suspiciousness. On the delusion dimension the patients
74700 were rated significantly higher. There were no significant
74800 differences along the dimensions of organic brain syndrome, fear,
74900 ideas of reference, depression and mania.
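The chapter does not say which statistical test produced the significant differences in Table 4. It does mention t values elsewhere, so the sketch below uses Welch's two-sample t statistic as a stand-in. It compares ratings of the model and of the patients dimension by dimension and labels each dimension by direction. The ratings are HYPOTHETICAL toy numbers, not the study's data. The cutoff of 2.0 is only a rough two-tailed 5% critical value; the proper value depends on the degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic for the difference in means (a - b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def compare_dimensions(model: dict[str, list[float]],
                       patients: dict[str, list[float]],
                       critical: float = 2.0) -> dict[str, str]:
    """Label each rating dimension by the sign and size of its t statistic."""
    verdicts = {}
    for dim in model:
        t = welch_t(model[dim], patients[dim])
        if t > critical:
            verdicts[dim] = "model higher"
        elif t < -critical:
            verdicts[dim] = "patients higher"
        else:
            verdicts[dim] = "no significant difference"
    return verdicts

# HYPOTHETICAL ratings on a 0-9 scale, invented for illustration only;
# they are not the data behind Table 4.
model = {"bizarreness": [6, 7, 5, 8, 6, 7], "fear": [4, 5, 3, 4, 5, 4]}
patients = {"bizarreness": [2, 3, 2, 4, 3, 2], "fear": [4, 4, 5, 3, 4, 5]}
print(compare_dimensions(model, patients))
```

The output has the same three-way split as the text: dimensions where the model is rated higher, dimensions where the patients are rated higher, and dimensions with no significant difference.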
75000 Whereas tests asking the machine-question indicate
75100 indistinguishability at the gross level, a study of the finer or
75200 micro-structure of the model's behavior through ratings along scaled
75300 dimensions shows significant differences between patients and model.
75400 These differences are of help to us in suggesting what aspects of the
75500 model should be modified to improve its performance. The graph of
75600 Fig. 2 shows that no modifications are necessary along the dimension
75700 of "organic brain syndrome". But the model's language comprehension
75800 clearly needs improvement, and a future test would tell whether
75900 improvement has occurred and by how much. Successive
76000 identification of particular areas of failure provides a type of
76100 sensitivity analysis which makes clear what improvements should be
76200 pursued in developing more adequate model versions.
76300 (INSERT FIG. 2 HERE)
76400
76500 6.5 ANALYSIS (5) A RANDOM MODEL
76600 Further evidence that the machine-question is too coarse and
76700 insensitive a test comes from the following experiment. We
76800 constructed a random version of the paranoid model which used
76900 PARRY's output statements but emitted them at random, regardless of
77000 what the interviewer said. Two psychiatrists conducted interviews with
77100 this model. The transcripts were paired with patient interviews
77200 and sent to 200 randomly selected psychiatrists, who were asked both
77300 the machine-question and the dimension-question. Of the 69 replies to
77400 the machine-question, 34 (49%) were right and 35 (51%) wrong. Based
77500 on this random sample of 69 psychiatrists, the 95% confidence
77600 interval for the percentage correct ranges from 37.5 to 61.1,
77700 again consistent with chance-level performance. When a poor model,
77800 such as a random one, passes a test, it suggests the test is weak.
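The chapter says only that the random version reused PARRY's output statements and ignored what the interviewer said. The class below is a minimal, hypothetical sketch of that idea, not the program actually used. `RandomParry` and the canned statements are invented for illustration. Each call to `reply` chooses an output at random, and the input never influences the choice.

```python
import random
from typing import Optional

class RandomParry:
    """Toy 'random model': answers with a canned output statement picked at
    random, ignoring whatever the interviewer typed."""

    def __init__(self, statements: list[str], seed: Optional[int] = None):
        if not statements:
            raise ValueError("need at least one output statement")
        self.statements = list(statements)
        self.rng = random.Random(seed)  # private generator, reproducible if seeded

    def reply(self, interviewer_input: str) -> str:
        # interviewer_input is deliberately unused: nothing links the
        # question to the answer, which is what makes the model "random".
        return self.rng.choice(self.statements)

# HYPOTHETICAL stand-ins, not PARRY's actual output statements.
canned = ["Why do you want to know?", "I don't trust people much.", "I'd rather not talk about that."]
bot = RandomParry(canned, seed=1)
for question in ["How are you today?", "Where were you born?"]:
    print(question, "->", bot.reply(question))
```

Seeding the generator makes a transcript reproducible. Two instances with the same seed produce the same replies to entirely different questions, which shows that the input is ignored.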
77900 (INSERT TABLE 5 HERE)
78000 Although a distinction is not made when the simple machine
78100 question is asked, definite distinctions ARE made when judgements are
78200 requested along specific dimensions. As shown in Table 5,
78300 significant differences appear along the dimensions of linguistic
78400 noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78500 rated higher. On these particular dimensions we can construct a
78600 continuum in which the random version represents one extreme, the
78700 actual patients another. Nonrandom PARRY lies somewhere between these
78800 two extremes, indicating that it performs significantly better than
78900 the random version but still requires improvement before it can be
79000 considered indistinguishable from patients relative to these
79100 dimensions. Table 6 presents t values for the differences between the
79200 mean ratings of PARRY and RANDOM-PARRY; the mean ratings themselves
79300 appear in Table 6 and Fig. 2.
79400 (INSERT TABLE 6 AND FIG 2 HERE)
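The text places nonrandom PARRY "somewhere between" the random version and the patients on a continuum. The chapter defines no numeric measure of this position. One simple, hypothetical way to express it is to normalize PARRY's mean rating on a dimension so that 0 corresponds to the random version's mean and 1 corresponds to the patients' mean. The function name and the example means below are invented for illustration. They are not values from Tables 5 and 6 or Fig. 2.

```python
def continuum_position(random_mean: float, model_mean: float, patient_mean: float) -> float:
    """Position of the model on the random-to-patient continuum for one dimension:
    0.0 = same mean as the random version, 1.0 = same mean as the patients."""
    span = random_mean - patient_mean
    if span == 0:
        raise ValueError("random version and patients have the same mean; no continuum")
    return (random_mean - model_mean) / span

# HYPOTHETICAL mean ratings on one dimension, for illustration only.
print(continuum_position(random_mean=7.0, model_mean=4.0, patient_mean=2.0))  # 0.6
```

Recomputing this number for each model version on each dimension would show whether successive versions move toward the patient end of the continuum. That is the kind of sensitivity analysis the text describes.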
79500 These studies indicate that a more useful way to use Turing-like
79600 tests is to ask expert judges to make ratings along multiple
79700 dimensions that are essential to the model. Thus the model can
79800 serve as an instrument for its own perfection. A good validation
79900 procedure has criteria for better or worse approximations. Useful
80000 tests do not necessarily prove a model; they probe it for its
80100 strengths and weaknesses and clarify what is to be done next in
80200 modifying and repairing the model. Simply asking the machine-question
80300 yields little information relevant to what the model builder most
80400 wants to know, namely, along which dimensions the model needs to
80500 be modified to improve its performance.
80600
80700 To conclude, it is perhaps historically significant that
80800 these tests were conducted at all. To my knowledge, no one to date
80900 has subjected his simulation model of human symbolic processes to
81000 indistinguishability tests. These tests set a precedent and provide a
81100 standard against which competing models can be measured.